Faster algorithms for string matching with k mismatches
Abstract
The string matching with mismatches problem is that of finding the number of mismatches between a pattern P of length m and every length-m substring of the text T of length n. Currently, the fastest algorithms for this problem are the following. The Galil–Giancarlo algorithm finds all locations where the pattern has at most k errors (where k is part of the input) in time O(nk). The Abrahamson algorithm finds the number of mismatches at every location in time O(n√(m log m)). We present an algorithm that is faster than both. Our algorithm finds all locations where the pattern has at most k errors in time O(n√(k log k)). We also show an algorithm that solves the above problem in time O((n + nk³/m) log k). © 2003 Elsevier Inc. All rights reserved.
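To make the problem statement concrete, here is a minimal brute-force sketch in Python. It is only an O(nm) baseline for the two tasks named in the abstract (counting mismatches at every alignment, and reporting alignments with at most k mismatches). It is not the Galil–Giancarlo, Abrahamson, or the paper's own algorithm, and the function names `mismatch_counts` and `k_mismatch_positions` are illustrative assumptions.

```python
def mismatch_counts(text, pattern):
    """Hamming distance between pattern and every length-m window of text.

    Brute-force O(n*m) baseline; names are hypothetical, not from the paper.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    return [
        sum(1 for j in range(m) if text[i + j] != pattern[j])
        for i in range(n - m + 1)
    ]


def k_mismatch_positions(text, pattern, k):
    """Start positions where pattern occurs with at most k mismatches.

    Stops scanning a window as soon as it exceeds k errors, but the
    worst case is still O(n*m), not the O(nk) bound cited above.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    positions = []
    for i in range(n - m + 1):
        errors = 0
        for j in range(m):
            if text[i + j] != pattern[j]:
                errors += 1
                if errors > k:
                    break  # this window already has too many mismatches
        else:
            # Inner loop finished without exceeding k errors.
            positions.append(i)
    return positions


if __name__ == "__main__":
    print(mismatch_counts("abcabd", "abd"))
    print(k_mismatch_positions("abcabd", "abd", 1))
```

A baseline like this is mainly useful as a correctness reference when testing a faster implementation on small inputs.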
Similar resources
Approximate Boyer-Moore String Matching
The Boyer-Moore idea applied in exact string matching is generalized to approximate string matching. Two versions of the problem are considered. The k mismatches problem is to find all approximate occurrences of a pattern string (length m) in a text string (length n) with at most k mismatches. Our generalized Boyer-Moore algorithm is shown (under a mild independence assumption) to solve the pro...
A Parallel Algorithm for Fixed-Length Approximate String-Matching with k-mismatches
This paper deals with the approximate string-matching problem with Hamming distance. The approximate string-matching with k-mismatches problem is to find all locations at which a query of length m matches a factor of a text of length n with k or fewer mismatches. The approximate string-matching algorithms have both pleasing theoretical features, as well as direct applications, especially in comp...
String Matching with Mismatches by Real-Valued FFT
String matching with mismatches is a basic concept of information retrieval with some kinds of approximation. This paper proposes an FFT-based algorithm for the problem of string matching with mismatches, which computes an estimate with accuracy. The algorithm consists of FFT computations for binary vectors which can be computed faster than the computation for vectors of complex numbers. Theref...
Reduced Nondeterministic Finite Automata for Approximate String Matching
We will show how to reduce the number of states of nondeterministic finite automata for approximate string matching with k mismatches and nondeterministic finite automata for approximate string matching with k differences in the case when we do not need to know how many mismatches or differences are in the found string. We will also show the impact of this reduction on Shift-Or based algorithms.
On string matching with k mismatches
In this paper we consider several variants of the pattern matching problem. In particular, we investigate the following problems: 1) Pattern matching with k mismatches; 2) Approximate counting of mismatches; and 3) Pattern matching with mismatches. The distance metric used is the Hamming distance. We present some novel algorithms and techniques for solving these problems. Both deterministic and...
Publication date: 2000